SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting

Published in ICPP 23 (CCF B), 2023

Recommended citation: Diaohan Luo, Yuewen Wu, Tian Yu, Tao Wang, Heng Wu, Wenbo Zhang. (2023). SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting. In: The 52nd International Conference on Parallel Processing (ICPP 2023). https://link.springer.com/chapter/10.1007/978-3-031-30637-2_31

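This paper is about how to optimize ML inference performance by splitting models: SPLIT divides a DNN model into evenly sized parts to support QoS-aware inference on a shared GPU.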

Download paper here

Recommended citation: Diaohan Luo, Yuewen Wu, Tian Yu, Tao Wang, Heng Wu, Wenbo Zhang. (2023). SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting. In: The 52nd International Conference on Parallel Processing (ICPP 2023), pp. 605-614.